Nelson
Text Role Classification in Scientific Charts Using Multimodal Transformers
Kim, Hye Jin, Lell, Nicolas, Scherp, Ansgar
Text role classification involves classifying the semantic role of textual elements within scientific charts. For this task, we propose to finetune two pretrained multimodal document layout analysis models, LayoutLMv3 and UDOP, on chart datasets. The transformers utilize the three modalities of text, image, and layout as input. We further investigate whether data augmentation and balancing methods help the performance of the models. The models are evaluated on various chart datasets, and results show that LayoutLMv3 outperforms UDOP in all experiments. LayoutLMv3 achieves the highest F1-macro score of 82.87 on the ICPR22 test dataset, beating the best-performing model from the ICPR22 CHART-Infographics challenge. Moreover, the robustness of the models is tested on a synthetic noisy dataset ICPR22-N. Finally, the generalizability of the models is evaluated on three chart datasets, CHIME-R, DeGruyter, and EconBiz, for which we added labels for the text roles. Findings indicate that even in cases where there is limited training data, transformers can be used with the help of data augmentation and balancing methods. The source code and datasets are available on GitHub under https://github.com/hjkimk/text-role-classification
Improving Buoy Detection with Deep Transfer Learning for Mussel Farm Automation
McMillan, Carl, Zhao, Junhong, Xue, Bing, Vennell, Ross, Zhang, Mengjie
The aquaculture sector in New Zealand is experiencing rapid expansion, with a particular emphasis on mussel exports. As the demands of mussel farming operations continue to evolve, the integration of artificial intelligence and computer vision techniques, such as intelligent object detection, is emerging as an effective approach to enhance operational efficiency. This study delves into advancing buoy detection by leveraging deep learning methodologies for intelligent mussel farm monitoring and management. The primary objective centers on improving accuracy and robustness in detecting buoys across a spectrum of real-world scenarios. A diverse dataset sourced from mussel farms is captured and labeled for training, encompassing imagery taken from cameras mounted on both floating platforms and traversing vessels, capturing various lighting and weather conditions. To establish an effective deep learning model for buoy detection with a limited number of labeled data, we employ transfer learning techniques. This involves adapting a pre-trained object detection model to create a specialized deep learning buoy detection model. We explore different pre-trained models, including YOLO and its variants, alongside data diversity to investigate their effects on model performance. Our investigation demonstrates a significant enhancement in buoy detection performance through deep learning, accompanied by improved generalization across diverse weather conditions, highlighting the practical effectiveness of our approach.
Seeing the Fruit for the Leaves: Towards Automated Apple Fruitlet Thinning
Qureshi, Ans, Loh, Neville, Kwon, Young Min, Smith, David, Gee, Trevor, Bachelor, Oliver, McCulloch, Josh, Nejati, Mahla, Lim, JongYoon, Green, Richard, Ahn, Ho Seok, MacDonald, Bruce, Williams, Henry
Following a global trend, the lack of reliable access to skilled labour is causing critical issues for the effective management of apple orchards. One of the primary challenges is maintaining skilled human operators capable of making precise fruitlet thinning decisions. Thinning requires accurately measuring the true crop load for individual apple trees to provide optimal thinning decisions on an individual basis. A challenging task due to the dense foliage obscuring the fruitlets within the tree structure. This paper presents the initial design, implementation, and evaluation details of the vision system for an automatic apple fruitlet thinning robot to meet this need. The platform consists of a UR5 robotic arm and stereo cameras which enable it to look around the leaves to map the precise number and size of the fruitlets on the apple branches. We show that this platform can measure the fruitlet load on the apple tree to with 84% accuracy in a real-world commercial apple orchard while being 87% precise.